Skip to main content

All Questions

Tagged with
4votes
0answers
23views

Trying to train ML model to do regression for US Department of Transportation Kaggle Flights Dataset with 5 million records and 7 features

For a college project for my data science course I am trying to fit a model based on the U.S. DOT's 2015 Kaggle Flight Cancellations dataset, but am not having great luck with model performance (MSE ...
Jake Malis's user avatar
2votes
0answers
20views

Preprocessing multivalue attributes in a dataframe, similar to Nominal

Description: Input is a CSV file CSV file contains columns of different data types: Ordinal Values, Nominal Values, Numerical Values and Multi Value For the multivalue columns. Minimum is 1, ...
DILF Unboxing's user avatar
3votes
1answer
31views

Looking to replace missing time series values with values from a competitor that's correlated

I have a dataset of a retailer that has the following attributes Date, Hour, Enters, Exits I have another dataset with the same attributes of a competitor that is correlated with the original dataset ...
utink's user avatar
1vote
1answer
45views

RFECV and grid search - what sets to use for hyperparameter tuning?

I am running machine learning models (all with sci-kit learn estimators, no neural networks) using a custom dataset with a number of features and binomial output. I first split the dataset into 0.6 (...
Alex's user avatar
2votes
1answer
71views

Why lightgbm .predict function has proba not between 0 and 1

I wanna understand why in this code, I get the following results: ...
Legna's user avatar
1vote
0answers
35views

How to predict price behavior according to model predictions for a week ahead?

I wrote the simplest linear regression model (I'm a noob, please don't scold me; this is my first model) to predict the price of solana, I would like to get some advice or tips on how to improve. The ...
bobrya_ziben's user avatar
0votes
1answer
550views

Python SK-Learn KNN Imputer ( "ValueError: could not convert string to float: )

I have data with missing values. All columns are integer, except for a column that has missing values. These missing values, were set with a "?" which was converted to NaN using the Numpy ...
Palu's user avatar
  • 103
0votes
1answer
31views

Best measure to inform how predicted value can differ from real one

I have trained a regression model and obtained a pandas series of the predicted values. I am working on a "calculator" that will be able to return a predicted value after entering an input ...
Paulina's user avatar
0votes
1answer
261views

Ugly AUC curves. Sklearn. How to make AUC Curves less square

I dislike the square look of this AUC curve (SKLearn). The purpose of this question is "visual". Please post code snippets. This question is not requesting the theory behind the AUC. My goal ...
Full Array's user avatar
0votes
1answer
1kviews

Is there any benefit to using cross validation from the XGBoost library over sklearn when tuning hyperparameters?

The XGBoost library has its own implementation of cross validation through xgboost.cv(). It looks like it requires data be stored as a DMatrix. Instead of using <...
Eli's user avatar
  • 101
0votes
2answers
2kviews

Does Scikit-Learn's OneHotEncoder make all Columns Categorical?

I've been using Scikit-Learn's OneHotEncoder to turn categorical data into binary columns, however, it seems that fitting ...
Connor's user avatar
0votes
1answer
164views

What Should I do with NaN Values when Transforming Categorial Data to Numerical Data?

I'm currently working on a dataset with several categorial features, which I need to transform into numerical features through one hot encoding. However, some of the features have ...
Connor's user avatar
0votes
1answer
953views

Linear Regression line not showing in plot

It's a silly problem, I know, but it's getting my nerves. Everything seems fine, but I cannot get the line to show on the plot. I've put it in a public Google notebook, for your convenience. t ...
Ignacio Guerrero's user avatar
0votes
1answer
675views

calculate sklearn metrics from 2d array

I have the following frame of actual value, ...
Ali A. Jalil's user avatar
0votes
0answers
70views

why when I find the best accuracy for logistic regression then it give me this error (AttributeError: split not found)

after run this code I face the split not found error. ...
Ibad Khan's user avatar

153050per page
close